Based on the data exploration/visualization I conducted in a previous document, the following metrics were determined to be potentially good features to be used for examining individual differences:
In a separate document, I modeled eccentricity using different non-linear functions. Here, I will use (a subset of) the features mentioned above to examine individual differences using this dataset.
Given how much data there was for this VR project, our team decided to focus on examining individual differences in head movement.
Though there has not been a lot of work done in this area, one thing that has been brought up in the literature was the fact that individuals may be categorized into “movers” vs “non-movers” based on their head movement data. As such, I was tasked with testing this particular hypothesis by taking a machine learning approach.
Based on the features that were derived, we focused specifically on the following features:
Head movement amplitude was not considered as the measure was derived based on eye movement fixations and thus might be somewhat arbitrary. Head movement direction and angle were also not considered for similar reasons.
Though the x/y coordinates of individual head movements are generally somewhat tied to fixations, they exist within an overall region and as such, metrics that capture overall head movement patterns may suffer less from the issue mentioned above. Likewise, the head rotation data sampled in the experiment provide a rough range of the degree to which individuals rotate their head.
Note: For eccentricity, I used the coefficients by fitting the data using an asymptotic regression function, as it fit the data the best overall.
For this project, I focused on using model-based clustering to examine individual differences in head movements. The advantage of this type of clustering method is that it uses the Bayesian Information Criterion (BIC) to determine the optimal number of clusters/groups based on the available data. This differs from other clustering techniques (e.g., hierarchical clustering), which tend to be more “heuristics”-based in that there are no “hard numeric values” for determining the optimal number of clusters.
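As a minimal sketch of this workflow with mclust (the toy data and variable names here are illustrative, not from the analysis below):

```r
library(mclust)

# toy data: two features, two latent groups
set.seed(1)
X <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
           matrix(rnorm(40, mean = 3), ncol = 2))

# compute BIC across candidate numbers of clusters (G)
# and covariance structures
bic <- mclustBIC(X, G = 1:5)

# fit the model that BIC selected
fit <- Mclust(X, x = bic)
fit$G               # chosen number of clusters
fit$classification  # hard cluster assignments
```

`mclustBIC()` evaluates every covariance model at every G, so the “optimal number of clusters” falls out of the fit rather than being chosen by eye.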
Load libraries:
library(mclust)
library(plyr)
library(ggplot2)
library(ggalluvial)
library(wesanderson)
library(dplyr)
library(corrgram)
library(openxlsx)
library(clValid)
Load data:
#----------------------- load data ------------------
df<-read.csv("VR_Basic Scene Viewing_Summary Per Trial_Updated 11Nov2020.csv")
sacc.coef<-read.xlsx("Eccentricity Parameters from Non-Linear Models.xlsx", sheet = 1)
head.coef<-read.xlsx("Eccentricity Parameters from Non-Linear Models.xlsx", sheet = 2)
# --------------------------- vector variables for graphing ------------------------------
# colour palette for subject-level analysis
palette<-c(wes_palettes$Darjeeling1, wes_palettes$GrandBudapest1, wes_palettes$GrandBudapest2, wes_palettes$Moonrise3, wes_palettes$Moonrise2)
Below are the features (and associated parameters) and how well they correlate with one another.
sum.head<- ddply(df, .(subject), summarise,
# head rotation
rot.avg=mean(head_rot_avg, na.rm=T),
rot.sd=sd(head_rot_avg, na.rm=T),
# SDE
# area
area.mean=mean(head.area, na.rm=T),
area.sd=sd(head.area, na.rm=T),
# theta
theta.mean=mean(head.theta, na.rm=T),
theta.sd=sd(head.theta, na.rm=T)
)
# merge the eccentricity parameters with it
head.sum<-merge(sum.head, head.coef[c("subject", "asym.max", "asym.int", "asym.lograte")], by="subject")
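The correlation plot referred to below can be produced with the corrgram package; the panel choices here are one reasonable option, not necessarily the original settings:

```r
# correlation plot of all features (excluding the subject ID)
corrgram(head.sum[, !(names(head.sum) %in% "subject")],
         lower.panel = panel.shade,  # shading encodes correlation strength
         upper.panel = panel.cor)    # numeric correlation values
```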
As seen in the correlation plot, there are some parameters that are highly correlated with one another (e.g., area.mean & area.sd, asym.int & asym.lograte). Since the cluster solutions would be pretty sensitive to the input, I tried a few different feature combinations (removing or retaining the highly correlated ones).
# standardize all variables such that they are all on the same scale
head<-scale(head.sum[, !(names(head.sum) %in% c("subject"))])
Note: features should be standardized since we want the units to be somewhat comparable. In this case, I’ve simply standardized everything, but it may be better to consider keeping the parameters of each feature (e.g., mean & sd) on the same scale, while standardizing each feature (e.g., area vs rotation).
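One way to implement the alternative mentioned in the note (standardizing each feature family jointly, so a measure’s mean and sd stay on a common scale) might look like this sketch; the column groupings are assumptions based on the feature names:

```r
# scale a family of columns by one shared centre/spread, so the
# mean and sd of a given measure remain comparable to each other
scale_family <- function(d, cols) {
  vals <- unlist(d[cols])
  (d[cols] - mean(vals, na.rm = TRUE)) / sd(vals, na.rm = TRUE)
}

head.alt <- head.sum
head.alt[c("rot.avg", "rot.sd")]       <- scale_family(head.sum, c("rot.avg", "rot.sd"))
head.alt[c("area.mean", "area.sd")]    <- scale_family(head.sum, c("area.mean", "area.sd"))
head.alt[c("theta.mean", "theta.sd")]  <- scale_family(head.sum, c("theta.mean", "theta.sd"))
```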
As shown in the figure, there appear to be three local optima. The “best” solution by raw BIC is almost always one that models each individual (or pair of individuals) as its own cluster, but since we’re interested in extracting meaningful individual differences, we typically examine solutions in the range of about 2 to 9-10 clusters.
Here, we see that the optimal solution between K of 1-10 is a 3-cluster solution.
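The fitted model object used later in the alluvial comparison (`mc.head`) can be obtained by restricting G to this range; a sketch, assuming the 3-cluster result from the plot:

```r
# fit the model-based clustering, letting BIC pick among 1-10 clusters
mc.head <- Mclust(head, G = 1:10)
mc.head$G                      # number of clusters in the best solution
table(mc.head$classification)  # cluster sizes
```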
One issue with the previous cluster solution is that it contained some fairly “redundant” measures. Redundancy can result in an overfitted model and can also make it take longer for the algorithm to reach a solution (the latter issue is probably negligible here given the relatively small number of observations, but as the number of observations increases, feature reduction becomes increasingly important).
In this project, I will simply be removing one of the features. There are, however, other ways to reduce features (e.g., dimension reduction).
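As an aside, a dimension-reduction alternative such as PCA could look like this sketch (not part of the original pipeline):

```r
# principal components on the standardized features
pc <- prcomp(head)       # `head` is the scaled matrix from above
summary(pc)              # variance explained per component
head.pc <- pc$x[, 1:3]   # e.g., keep the first 3 components
# mclust could then be run on head.pc instead of the raw features
```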
First, I’ll remove the area.sd variable from the dataset.
# standardize all variables such that they are all on the same scale
head2<-scale(head.sum[, !(names(head.sum) %in% c("subject", "area.sd"))])
# plot of BIC values
plot(mclustBIC(head2, G = 1:21))
Compared to the previous solution, it seems that the optimal solutions are at K = 2, 13, and 21. Below is a closer look at the first part of the graph.
Note: I also tried removing the area.mean measure instead of the area.sd measure and the cluster solutions were fairly similar, K = 2, 13, 21
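The corresponding fitted model (`mc.head2`, used later in the comparison and for prediction) would be obtained the same way; a sketch, restricting G to the interpretable range:

```r
# fit on the reduced feature set; BIC selects K within 1-10
mc.head2 <- Mclust(head2, G = 1:10)
mc.head2$G  # chosen number of clusters
```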
Next, I’ll be looking at what happens to the model when I remove asym.int from the data.
# standardize all variables such that they are all on the same scale
head3<-scale(head.sum[, !(names(head.sum) %in% c("subject", "area.sd", "asym.int"))])
# plot of BIC values
plot(mclustBIC(head3, G = 1:21))
As with the previous model, the optimal solutions appear to be at K = 2, 13, and 21. Below is a closer look at the first part of the graph.
Interestingly, the cluster solution for removing asym.lograte differed from that of removing asym.int (model 3).
# standardize all variables such that they are all on the same scale
head4<-scale(head.sum[, !(names(head.sum) %in% c("subject", "area.sd", "asym.lograte"))])
# plot of BIC values
plot(mclustBIC(head4, G = 1:21))
Here, we’re getting local optima at K = 3, 14, and 20.
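The fitted models for these last two feature sets (`mc.head3` and `mc.head4`, referenced in the alluvial graph below) would be fit analogously; a sketch:

```r
# fit the remaining two feature combinations over the same range of G
mc.head3 <- Mclust(head3, G = 1:10)  # area.sd + asym.int removed
mc.head4 <- Mclust(head4, G = 1:10)  # area.sd + asym.lograte removed
c(mc.head3$G, mc.head4$G)            # optimal K for each
```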
I created an alluvial graph to help compare and visualize the different model classifications. As shown in the graph below, the results actually look pretty consistent.
# create separate data frame for the classification
head.group<-data.frame(
subject = factor(head.sum$subject),
mod1 = factor(mc.head$classification),
mod2 = factor(mc.head2$classification),
mod3 = factor(mc.head3$classification),
mod4 = factor(mc.head4$classification),
Freq = 1
)
# construct alluvial graph
ggplot(head.group, aes(y = Freq, axis1=subject, axis2=mod1, axis3=mod2, axis4=mod3, axis5=mod4)) +
geom_alluvium(aes(fill = mod1)) +
scale_x_discrete(limits = c("Subject", "Full", "Area.SD\nRemoved", "Area.SD + Asym.Int\nRemoved", "Area.SD + Asym.Lograte\nRemoved"), expand = c(.2, .05)) +
guides(fill = "none") +
geom_stratum() +
geom_text(stat = "stratum", aes(label = after_stat(stratum))) +
theme_minimal() +
scale_fill_manual(values=wes_palette("GrandBudapest1", 4))+
ggtitle("Comparison of Classification of Participants' Head Movement")
## Warning in to_lodes_form(data = data, axes = axis_ind, discern =
## params$discern): Some strata appear at multiple axes.
But what do the data look like? Below, I’ve categorized individuals’ data into groups based on the classifications in Models 2/3, since they were the most consistent models:
As shown in these graphs, it would appear that the classification is mainly on how sporadic the head movement pattern seems to be, rather than whether individuals were or weren’t movers.
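A sketch of how the trial-level data could be split by the Model 2 classification for these per-group plots (merge keys assumed from the earlier code):

```r
# attach each subject's Model 2 cluster label to the trial-level data
groups <- data.frame(subject = head.sum$subject,
                     cluster = factor(mc.head2$classification))
df.grouped <- merge(df, groups, by = "subject")
# head movement plots can then be faceted by cluster, e.g.:
# ggplot(df.grouped, aes(...)) + ... + facet_wrap(~cluster)
```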
df2<-read.csv("VR_Basic Scene Viewing_Summary Per Trial (Interior)_Updated 27Nov2020.csv")
head.coef2<-read.xlsx("Eccentricity Parameters from Non-Linear Models (Interior).xlsx", sheet = 2)
Below are the features (and associated parameters) and how well they correlate with one another.
sum.head2<- ddply(df2, .(subject), summarise,
# head rotation
rot.avg=mean(head_rot_avg, na.rm=T),
rot.sd=sd(head_rot_avg, na.rm=T),
# SDE
# area
area.mean=mean(head.area, na.rm=T),
area.sd=sd(head.area, na.rm=T),
# theta
theta.mean=mean(head.theta, na.rm=T),
theta.sd=sd(head.theta, na.rm=T)
)
# merge the eccentricity parameters with it
head.sum2<-merge(sum.head2, head.coef2[c("subject", "asym.max", "asym.int", "asym.lograte")], by="subject")
Below are the results for the best-fit solution. As observed here, the best cluster solution seems to be K = 2.
# standardize all variables such that they are all on the same scale
head.int2<-scale(head.sum2[, !(names(head.sum2) %in% c("subject", "area.sd"))])
rownames(head.int2)<-head.sum2$subject
# plot of BIC values
plot(mclustBIC(head.int2, G = 1:10))
Can the classification derived from the outdoor scenes be used to make reasonable predictions about group membership based on the indoor data set? Here, I’m using the same model as the one described above for the outdoor space, and using it to predict the group membership of individuals in the indoor data set.
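A sketch of how the prediction and indoor model objects (`int.pred` and `mc.head.int2`, used below) could be produced; `mc.head2` is assumed to be the outdoor model fit earlier:

```r
# model fit directly on the indoor data
mc.head.int2 <- Mclust(head.int2, G = 1:10)

# predict indoor group membership from the outdoor model
int.pred <- predict(mc.head2, newdata = head.int2)
int.pred$classification  # predicted cluster per subject
```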
How well do these predictions match the mclust results?
Based on the model, the results don’t actually look too terrible. Keep in mind that there were only 21 participants in the data used for training (i.e., a fairly small training set). As a proof of concept, this definitely seems like a promising approach.
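One simple way to quantify the agreement between the predictions and the indoor model’s own classification (assuming the objects defined above):

```r
# cross-tabulate predicted vs model-derived clusters
table(predicted = int.pred$classification,
      derived   = mc.head.int2$classification)

# chance-corrected agreement (1 = perfect, ~0 = chance level)
adjustedRandIndex(int.pred$classification,
                  mc.head.int2$classification)
```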
# create separate data frame for the classification
head.group<-data.frame(
subject = factor(head.sum2$subject),
pred = factor(int.pred$classification),
derived = factor(mc.head.int2$classification),
Freq = 1
)
# construct alluvial graph
ggplot(head.group, aes(y = Freq, axis1=subject, axis2=pred, axis3=derived)) +
geom_alluvium(aes(fill = pred)) +
scale_x_discrete(limits = c("Subject", "Prediction", "Model Classification"), expand = c(.2, .05)) +
guides(fill = "none") +
geom_stratum() +
geom_text(stat = "stratum", aes(label = after_stat(stratum))) +
theme_minimal() +
scale_fill_manual(values=wes_palette("GrandBudapest2", 4))+
ggtitle("Prediction vs Model-based Classification for Indoor Viewing")
## Warning in to_lodes_form(data = data, axes = axis_ind, discern =
## params$discern): Some strata appear at multiple axes.